Document regr_r2 coefficient of determination aggregate #449

Merged
javier merged 6 commits into main from sm_regr_r2 on Jun 4, 2026
Conversation

@jovfer
Contributor

@jovfer jovfer commented May 15, 2026

Pairs with questdb/questdb#7104 — the implementation PR for the new regr_r2(y, x) aggregate.

Summary

  • Add a regr_r2 section to the finance functions reference, alphabetically between regr_intercept and regr_slope.
  • Add a corresponding entry to the May 2026 reference changelog.

What the section covers

  • One-line definition (coefficient of determination, 0-1 scale).
  • Behavior bullets: null when X is constant (covers the single-row case), 1.0 when Y is constant and X varies (per SQL:2003), supported types, argument order, and the divergence from corr(y, x) at the constant-Y edge.
  • Math: the centered-moments form $r^2 = S_{xy}^2 / (S_{xx} \cdot S_{yy})$, with a note on the $S_{xx} = 0$ and $S_{yy} = 0$ branches.
  • Three worked examples mirroring the surrounding sections:
    • A fleet-telemetry trend-detection query that uses regr_slope plus regr_r2 > 0.7 as a significance filter.
    • The shared measurements table, with $r^2 = 0.81$.
    • A GROUP BY example over sales_data showing strong per-category linear fit.
  • A null-handling example confirming that pairs with either argument null are skipped.
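
The behavior bullets and the centered-moments formula above can be sketched in Python. This is only an illustrative model of the documented semantics, not the QuestDB implementation: the helper name `regr_r2_sketch`, the use of `None` for SQL null, the null result for an empty input, and the exact-zero comparisons are all assumptions made for the sketch.

```python
from typing import Iterable, Optional, Tuple

Pair = Tuple[Optional[float], Optional[float]]  # (y, x), matching regr_r2(y, x)


def regr_r2_sketch(pairs: Iterable[Pair]) -> Optional[float]:
    """Hypothetical model of regr_r2(y, x) as described in this PR."""
    # Pairs with either argument null are skipped.
    clean = [(y, x) for y, x in pairs if y is not None and x is not None]
    if not clean:
        return None  # assumption: no usable pairs yields null

    n = len(clean)
    mean_y = sum(y for y, _ in clean) / n
    mean_x = sum(x for _, x in clean) / n

    # Centered moments S_xx, S_yy, S_xy.
    s_xx = sum((x - mean_x) ** 2 for _, x in clean)
    s_yy = sum((y - mean_y) ** 2 for y, _ in clean)
    s_xy = sum((x - mean_x) * (y - mean_y) for y, x in clean)

    if s_xx == 0:
        return None  # X is constant (this also covers the single-row case)
    if s_yy == 0:
        return 1.0  # Y is constant while X varies
    return s_xy ** 2 / (s_xx * s_yy)
```

The order of the two branch checks matters in this sketch: when both X and Y are constant, the $S_{xx} = 0$ check fires first and the result is null.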

Tradeoffs and notes

  • The page reuses the existing measurements and sales_data example tables to keep the regression section coherent. The R-squared values for those tables are reported as computed, not hand-picked for visual appeal.
  • The divergence from corr() at $S_{yy} = 0$ is called out explicitly. Users who reach for regr_r2 expecting strict mathematical equivalence with corr^2 will notice this one boundary; the alternative (returning NaN there) would diverge from the SQL standard.
  • The fleet-telemetry example references motor_temp and elapsed_seconds columns to match the roadmap motivation (roadmap#120, "Statistical SQL functions for fleet-wide and cohort analysis", gap 3). The query is illustrative; the page does not provide a runnable demo table for it.
  • This PR documents only regr_r2. The other six standard regression aggregates listed in the same roadmap gap (regr_count, regr_avgx, regr_avgy, regr_sxx, regr_sxy, regr_syy) are not implemented yet and not documented here.
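
The divergence from corr() noted above can be illustrated with a small Python comparison. This is a sketch under stated assumptions rather than either function's real implementation: `corr_sketch`, `regr_r2_sketch`, and `centered_moments` are hypothetical helpers, `None` stands in for SQL null, and the null result of `corr_sketch` for constant X is assumed (the PR only states the constant-Y case).

```python
import math
from typing import Optional, Sequence, Tuple


def centered_moments(ys: Sequence[float], xs: Sequence[float]) -> Tuple[float, float, float]:
    """Return (S_xx, S_yy, S_xy) for equal-length, non-empty inputs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return s_xx, s_yy, s_xy


def corr_sketch(ys: Sequence[float], xs: Sequence[float]) -> Optional[float]:
    """Hypothetical model of corr(y, x): null when either variance is zero."""
    s_xx, s_yy, s_xy = centered_moments(ys, xs)
    if s_xx == 0 or s_yy == 0:  # constant-X null is an assumption
        return None
    return s_xy / math.sqrt(s_xx * s_yy)


def regr_r2_sketch(ys: Sequence[float], xs: Sequence[float]) -> Optional[float]:
    """Hypothetical model of regr_r2(y, x) with the SQL:2003 edge cases."""
    s_xx, s_yy, s_xy = centered_moments(ys, xs)
    if s_xx == 0:
        return None
    if s_yy == 0:
        return 1.0  # the one boundary where regr_r2 and corr^2 disagree
    return s_xy ** 2 / (s_xx * s_yy)


xs = [1.0, 2.0, 3.0]
varying_y = [1.0, 3.0, 2.0]
constant_y = [3.0, 3.0, 3.0]

# Away from the boundary the two agree: corr^2 equals regr_r2.
agree = corr_sketch(varying_y, xs) ** 2 == regr_r2_sketch(varying_y, xs)

# At S_yy = 0 they diverge: corr is null, regr_r2 is 1.0.
boundary = (corr_sketch(constant_y, xs), regr_r2_sketch(constant_y, xs))
```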

Test plan

  • Build the docs locally (yarn start) and visit /docs/query/functions/finance/#regr_r2 to verify rendering.
  • Confirm KaTeX renders the centered formula correctly.
  • Verify the changelog entry links to the new anchor.
  • Cross-check the worked-example values against the implementation in questdb/questdb#7104.

Add a regr_r2(y, x) entry to the finance functions reference,
placed alphabetically between regr_intercept and regr_slope.

The section covers what R-squared measures, supported types,
argument order, and the SQL:2003 edge cases:
- null when X is constant (Sxx = 0)
- 1.0 when Y is constant and X varies (Syy = 0)

It also notes the divergence from corr(y, x), which returns null
in the constant-Y case rather than 1.0.

Three worked examples mirror the surrounding sections: a fleet
telemetry trend-detection query (slope + R-squared as a
significance filter), a basic dataset with R-squared = 0.81, and
a GROUP BY example over sales_data showing a near-perfect linear
fit per category. A null-handling example confirms that pairs
with either argument null are skipped.
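
The "slope + R-squared as a significance filter" idea can be sketched in Python as follows. This is a hedged illustration only: the helper names, the synthetic readings, the requirement that the slope be positive, and the null slope for constant X are assumptions; only the 0.7 threshold comes from the PR description.

```python
from typing import Optional, Sequence, Tuple


def slope_and_r2(ys: Sequence[float], xs: Sequence[float]) -> Tuple[Optional[float], Optional[float]]:
    """Hypothetical helper returning (slope, r2) from centered moments."""
    n = len(xs)
    if n == 0:
        return None, None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_yy = sum((y - mean_y) ** 2 for y in ys)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if s_xx == 0:
        return None, None  # assumption: both are null when X is constant
    slope = s_xy / s_xx
    r2 = 1.0 if s_yy == 0 else s_xy ** 2 / (s_xx * s_yy)
    return slope, r2


def is_significant_trend(ys: Sequence[float], xs: Sequence[float], min_r2: float = 0.7) -> bool:
    """Flag a rising trend only when the linear fit explains enough variance."""
    slope, r2 = slope_and_r2(ys, xs)
    return slope is not None and r2 is not None and slope > 0 and r2 > min_r2


# Synthetic readings, invented for illustration only.
elapsed = [0.0, 1.0, 2.0, 3.0]
steady_rise = [10.0, 12.0, 14.0, 16.0]  # clean upward line
noisy = [10.0, 16.0, 8.0, 14.0]         # slight upward slope, poor fit

flag_steady = is_significant_trend(steady_rise, elapsed)
flag_noisy = is_significant_trend(noisy, elapsed)
```

Both synthetic series have a positive slope, but only the first passes the filter, which is the point of pairing the slope with an R-squared threshold.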

Also add the function to the May 2026 reference changelog entry.

Pairs with questdb/questdb#7104.
@jovfer jovfer added the documentation and enhancement labels May 15, 2026
@github-actions

github-actions Bot commented May 15, 2026

🚀 Build success!

Latest successful preview: https://preview-449--questdb-documentation.netlify.app/docs/

Commit SHA: b7a909e

📦 Build generates a preview & updates link on each commit.

@jovfer jovfer marked this pull request as ready for review May 27, 2026 08:57
Collaborator

@javier javier left a comment

Minor nit about the order of y and x, but we need to change the SQL examples.

Comment thread documentation/query/functions/finance.md Outdated
jovfer and others added 4 commits June 3, 2026 17:31
Address @javier review on #449:

- The fleet-telemetry example was not valid QuestDB SQL (fictional table,
  HAVING on aggregates, now() - 7d interval arithmetic). Replace all examples
  with queries against the demo instance `trips` table, each verified on
  https://demo.questdb.io with its exact result.
- Add a grouped fit-quality example over `payment_type` (R-squared as an
  anomaly lens) and a null-handling example with a non-trivial result.
- Fix the y/x argument order in the supported-types and means bullets for
  consistency with the regr_r2(y, x) signature.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

This is the finance section, so use finance data and mirror the neighbouring
regr_slope / regr_intercept example structure.

- Replace the NYC-taxi `trips` examples with two examples on the demo
  `market_data_ohlc_1d` FX table, both using regr_r2(close, open) so the basic
  and grouped examples share one regression like the neighbours do.
- The grouped example uses GROUP BY symbol (matching regr_slope) and surfaces a
  real signal: the pegged USDHKD scores far lower than the floating majors.
- Keep it to two focused examples; null handling is already covered in the
  function's bullet list.

All values verified on https://demo.questdb.io.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@javier javier merged commit 39c32f9 into main Jun 4, 2026
3 checks passed
@javier javier deleted the sm_regr_r2 branch June 4, 2026 11:08
Labels

documentation (Improvements or additions to documentation), enhancement (New feature or request)

2 participants